Dean Adams, Iowa State University
26 May, 2020
Morphometricians wish to explain the evolution of phenotypic diversity
How do we characterize patterns, and hypothesize processes in an evolutionary context?
Understanding morphological trends requires:
1: Methods for quantifying morphology (morphometrics)
2: Statistical models & tools for testing hypotheses in a phylogenetic framework
GMM: A mathematical merger of statistical shape theory & analytics for extracting and analyzing shape from landmark data (via GPA)
Procedures generate a set of shape variables for subsequent analyses*
* Note: considerable statistical theory was required to derive appropriate analytics for such data
How can we leverage GMM to address macroevolutionary hypotheses?
Trait correlations often used to study coevolution and adaptation
The problem? Species are not independent of one another
One must account for this via phylogenetic comparative methods
Let’s simulate phenotypic traits on a phylogeny under Brownian motion:
Closely related species tend to resemble one another (i.e., \(\small{E(\Sigma_{T1,T2})>0}\))
Traits appear correlated, though simulated independently
Conclusion: We must account for phylogeny during comparative analyses
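The simulation above can be sketched in R with the ape package (tree size and seed here are arbitrary choices, not the slide's exact simulation):

```r
library(ape)

set.seed(1)
tree <- rcoal(50)                    # random ultrametric 50-taxon phylogeny
x <- rTraitCont(tree, model = "BM")  # trait 1, simulated independently of trait 2
y <- rTraitCont(tree, model = "BM")  # trait 2
cor(x, y)                            # often far from 0, despite independence
```

Re-running the simulation shows that correlations of substantial magnitude arise regularly, purely from shared ancestry.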
PCMs condition the data on the phylogeny during the analysis
Empirical Goal: Evaluate evolutionary hypotheses while accounting for (phylogenetic) non-independence
Requires an evolutionary model of how trait variation is expected to accumulate
A useful null model of trait change is Brownian motion: \(\small\Delta\mu=0\) but \(\small\sigma^2\uparrow\propto{time}\)
Side-note: this is the continuous-trait model equivalent of the Markov process, and is intimately related to Gaussian theory and the normal distribution
see Felsenstein (1973; 1985)
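The variance-proportional-to-time property can be verified with a base-R sketch of many independent Brownian walks (step count and replicate number are arbitrary):

```r
set.seed(2)
steps <- 100; reps <- 5000
# each column is one Brownian walk: cumulative sums of iid normal increments
walks <- apply(matrix(rnorm(steps * reps), steps, reps), 2, cumsum)
# mean displacement stays near 0, but variance grows ~ linearly with elapsed time
round(c(t25 = var(walks[25, ]), t50 = var(walks[50, ]), t100 = var(walks[100, ])))
```

The three variances come out close to 25, 50, and 100, matching \(\small\sigma^2\propto{time}\) with \(\small\sigma^2=1\) per step.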
The conceptual development of PCMs
The methods of PCM: Analytical tools to address particular evolutionary hypotheses
Today we will focus on Phylogenetic Linear Models (regression/ANOVA)
Most PCMs use GLS (generalized least squares) as a model:
\(\small\mathbf{V}\) is the phylogenetic covariance matrix: it describes the amount of evolutionary time species share via common ancestry (and thus how similar they are expected to be)
Sometimes V is called C (particularly when derived under Brownian motion)
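For a concrete picture, \(\small\mathbf{V}\) can be computed from a tree with ape's `vcv` (the three-taxon tree here is a made-up example):

```r
library(ape)

tree <- read.tree(text = "((A:1,B:1):1,C:2);")
vcv(tree)
# diagonal = each species' root-to-tip time (2 units)
# off-diagonal = shared evolutionary time: A and B share 1 unit; neither shares any with C
```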
One analytical solution can be appreciated as follows:
From the circles above, one can envision a set of contrasts between sister taxa that are independent given the phylogeny (see Felsenstein 1985)
Estimate contrast scores between pairs of taxa (tips or nodes)
Use contrasts for analyses (OLS solution)
Coefficients found as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T}_{pic} \mathbf{X}_{pic}\right )^{-1}\left ( \mathbf{X}^{T}_{pic} \mathbf{Y}_{pic}\right )\)
see Felsenstein. Am. Nat. (1985)
What is the association between Y and X?
## Analysis of Variance Table
##
## Response: pic.y
## Df Sum Sq Mean Sq F value Pr(>F)
## pic.x 1 14.35 14.35 19.3 0.0046
## Residuals 6 4.47 0.74
## pic.x
## 0.885
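An analysis of this form maps onto ape's `pic` followed by an OLS fit through the origin; a sketch on simulated data (the tree and traits here are not the slide's dataset):

```r
library(ape)

set.seed(3)
tree <- rcoal(9)              # 9 tips -> 8 independent contrasts
x <- rTraitCont(tree)
y <- rTraitCont(tree)
pic.x <- pic(x, tree)         # phylogenetically independent contrasts
pic.y <- pic(y, tree)
fit <- lm(pic.y ~ pic.x + 0)  # contrasts regression is through the origin
coef(fit)
```

The `+ 0` is essential: contrasts have no defined intercept, so the regression must pass through the origin.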
Alternative implementation: Use GLS model with non-independent error structure
Statistical model: \(\small\mathbf{Y}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\) where \(\small\mathbf{E} \sim \mathcal{N}(0,\textbf{V})\)
\(\textbf{V}\) is the phylogenetic covariance matrix
Coefficients found as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T} \mathbf{V}^{-1} \mathbf{X}\right )^{-1}\left ( \mathbf{X}^{T} \mathbf{V}^{-1}\mathbf{Y}\right )\)
*Note: results identical to PIC (if implemented correctly)
see Grafen. Phil. Trans. Roy. Soc. (1989)
## Denom. DF: 6
## numDF F-value p-value
## (Intercept) 1 12.9 0.0115
## X 1 19.3 0.0046
## (Intercept) X
## -2.330 0.885
Identical results to PICs!
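This GLS formulation corresponds to nlme's `gls` with a Brownian-motion correlation structure from ape; a sketch on simulated data (not the slide's dataset):

```r
library(ape)
library(nlme)

set.seed(4)
tree <- rcoal(9)
dat <- data.frame(X = rTraitCont(tree), Y = rTraitCont(tree),
                  sp = tree$tip.label)   # species labels to match rows to tips
fit <- gls(Y ~ X, data = dat,
           correlation = corBrownian(1, phy = tree, form = ~sp))
coef(fit)
```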
Alternative implementation: Utilize OLS transformation of GLS models
Phylogenetic transformation matrix: \(\small\mathbf{T}=\left ( \mathbf{U}\mathbf{W}^{1/2} \mathbf{U}^{T}\right )^{-1}\)
U and W are eigenvectors and eigenvalues of V
Transformed data: \(\small\tilde{\mathbf{Y}}=\mathbf{TY}\)
Transformed design matrix: \(\small\tilde{\mathbf{X}}=\mathbf{TX}\)
Coefficients solved as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right )\)
*Note: results identical to PIC & PGLS (if implemented correctly)
see Garland and Ives. Am. Nat. (2000); Adams. Evol. (2014); Adams and Collyer. Evol. (2018)
## Df SS MS Rsq F Z Pr(>F)
## X 1 14.35 14.35 0.763 19.3 1.82 0.002
## Residuals 6 4.47 0.74 0.237
## Total 7 18.82
## (Intercept) X
## -2.330 0.885
Identical results to PICs & PGLS!
PIC, PGLS, and Phylogenetic transform yield identical parameter estimates
\[\tiny \hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T} \mathbf{V}^{-1} \mathbf{X}\right )^{-1}\left ( \mathbf{X}^{T} \mathbf{V}^{-1}\mathbf{Y}\right )\]
\[\tiny =\left ( \mathbf{X}^{T}_{pic} \mathbf{X}_{pic}\right )^{-1}\left ( \mathbf{X}^{T}_{pic} \mathbf{Y}_{pic}\right )\]
\[\tiny =\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right )\]
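The equivalence can be verified numerically by coding the transformation directly from the eigendecomposition of \(\small\mathbf{V}\) and comparing with the direct GLS solution (simulated data; variable names are my own):

```r
library(ape)

set.seed(5)
tree <- rcoal(9)
V <- vcv(tree)                    # phylogenetic covariance matrix
y <- rTraitCont(tree)
X <- cbind(1, rTraitCont(tree))   # intercept + one predictor

eig <- eigen(V)
# T = (U W^{1/2} U')^{-1}, so that T V T' = I
Tm <- solve(eig$vectors %*% diag(sqrt(eig$values)) %*% t(eig$vectors))
Yt <- Tm %*% y
Xt <- Tm %*% X

b.trans <- solve(t(Xt) %*% Xt) %*% t(Xt) %*% Yt                        # OLS on transformed data
b.gls   <- solve(t(X) %*% solve(V) %*% X) %*% t(X) %*% solve(V) %*% y  # direct GLS
all.equal(as.vector(b.trans), as.vector(b.gls))                        # TRUE
```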
Model significance is typically assessed using parametric procedures (comparing a reduced [null] model to the full model via):
F-ratios compared to parametric F-distribution
Compare \(\log{L}\) for the two models
\[\tiny \log{L}=\log{ \left(\frac{exp{\left({-\frac{1}{2}{\left({\mathbf{Y}-E(\mathbf{Y})}\right)^{T}}} \mathbf{V}^{-1}\left({\mathbf{Y}-E(\mathbf{Y})}\right)\right)}} {\sqrt{\left({2\pi}\right)^{Np}\times\begin{vmatrix} \mathbf{V} \end{vmatrix}}}\right)}\]
PROBLEM: Parametric significance testing suffers from Rao’s paradox
Statistical power decreases as the number of trait dimensions (\(p\)) increases
Why does this happen?
Adams. Evol. (2014); Adams and Collyer. Syst. Biol. (2018); Adams and Collyer. Ann. Rev. Ecol. Evol. Syst. (2019)
One can understand this result by appreciating the likelihood equation:
\[\tiny \log{L}=\log{ \left(\frac{exp{\left({-\frac{1}{2}{\left({\mathbf{Y}-E(\mathbf{Y})}\right)^{T}}} \mathbf{V}^{-1}\left({\mathbf{Y}-E(\mathbf{Y})}\right)\right)}} {\sqrt{\left({2\pi}\right)^{Np}\times\begin{vmatrix} \mathbf{V} \end{vmatrix}}}\right)}\]
We see there are three components to \(\small\log{L}\):
| Element | Component | Meaning |
|---|---|---|
| 1 | \(\tiny\frac{Np}{2}log{2\pi}\) | A constant |
| 2 | \(\tiny\frac{1}{2}{\left(\mathbf{Y}-E(\mathbf{Y})\right)^{T}\mathbf{V}^{-1}\left(\mathbf{Y}-E(\mathbf{Y})\right)}\) | Weighted sum of squared deviations of the observations from their predicted values |
| 3 | \(\tiny\frac{1}{2}log\begin{vmatrix} \mathbf{V} \end{vmatrix}\) | The error covariance of the model |
Surely #2 & #3 vary across different models? … WRONG!!
Let’s look at: \(\tiny\frac{1}{2}{\left(\mathbf{Y}-E(\mathbf{Y})\right)^{T}\mathbf{V}^{-1}\left(\mathbf{Y}-E(\mathbf{Y})\right)}\) (univariate OLS example for simplicity):
\(\tiny\tiny\mathbf{Y}=\begin{bmatrix} 6\\ 7\\ 4\\ 5\\ 9\\ 3\\ 2\\ 5 \end{bmatrix}\) \(\tiny\tiny\mathbf{X_{0}}=\begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1 \end{bmatrix}\) \(\tiny\tiny\mathbf{X_{1}}=\begin{bmatrix} 1 & 1\\ 1& 3\\ 1& 3\\ 1& 4\\ 1& 5\\ 1& 6\\ 1& 7\\ 1& 8 \end{bmatrix}\)
y  <- c(6, 7, 4, 5, 9, 3, 2, 5)
N  <- length(y)
x0 <- matrix(1, N, 1)                       # null model: intercept only
x1 <- cbind(1, c(1, 3, 3, 4, 5, 6, 7, 8))   # full model: intercept + predictor
beta0  <- solve(t(x0)%*%x0)%*%t(x0)%*%y
sigma0 <- (t(y - x0%*%beta0)%*%(y - x0%*%beta0)) / N
(t(y - x0%*%beta0)%*%(y - x0%*%beta0)) / (2*sigma0)
## [,1]
## [1,] 4
beta1  <- solve(t(x1)%*%x1)%*%t(x1)%*%y
sigma1 <- (t(y - x1%*%beta1)%*%(y - x1%*%beta1)) / N
(t(y - x1%*%beta1)%*%(y - x1%*%beta1)) / (2*sigma1)
## [,1]
## [1,] 4
Wait! What? For two different models this component is IDENTICAL!
ALGEBRA, please provide us with wisdom…
And algebra answers…“Yes I will!”
Say \(\small\mathbf{X}\) is a vector of ones and \(\small\mathbf{Y}\) is univariate. Then for OLS models, \(\small\mathbf{X\beta}\) is a vector of the mean \(\small\mu_{Y}\)
Thus, \(\small{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}\) is \(\small\mathbf{SS}_{\epsilon}\)
Thus we have: \(\small\frac{1}{2\sigma^{2}}{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}=\frac{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}{2\frac{\mathbf{E}^{T}\mathbf{E}}{n}}\)
Re-arranging gives us \(\small=\frac{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}{\frac{2}{n}\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}=\frac{n}{2}\)
REGARDLESS of the data in Y \(\small\frac{1}{2\sigma^{2}}{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}\) is the constant: \(\small\frac{n}{2}\)
This result also holds for multivariate and GLS models, where the general solution is \(\small\frac{np}{2}\).
Our likelihood thus contains:
\[\tiny \log{L}=\log{ \left(\frac{exp{\left({-\frac{1}{2}{\left({\mathbf{Y}-E(\mathbf{Y})}\right)^{T}}} \mathbf{V}^{-1}\left({\mathbf{Y}-E(\mathbf{Y})}\right)\right)}} {\sqrt{\left({2\pi}\right)^{Np}\times\begin{vmatrix} \mathbf{V} \end{vmatrix}}}\right)}\]
| Element | Component | Meaning |
|---|---|---|
| 1 | \(\tiny\frac{Np}{2}log{2\pi}\) | A constant |
| 2 | \(\tiny\frac{1}{2}{\left(\mathbf{Y}-E(\mathbf{Y})\right)^{T}\mathbf{V}^{-1}\left(\mathbf{Y}-E(\mathbf{Y})\right)}\) | A constant |
| 3 | \(\tiny\frac{1}{2}log\begin{vmatrix} \mathbf{V} \end{vmatrix}\) | The error covariance of the model |
Thus, for parametric statistical hypothesis testing, all logL estimates (and by consequence derived summary measures such as AIC) are based on ONE component that is free to vary across models, \(\small\frac{1}{2}log\begin{vmatrix} \mathbf{V} \end{vmatrix}\)!
And this component becomes singular as \(N\rightarrow{p}\)!
We need an alternative approach
Forgo standard ML and parametric approaches for statistical evaluation, and use robust methods
New GMM/PCM Approach:
1: Condition data on phylogeny & fit model parameters
2: Obtain robust summary measures (avoid \(\begin{vmatrix} \mathbf{V} \end{vmatrix} = 0\))
3: Evaluate significance and effect sizes NOT using logL
We utilize the phylogenetic transformation procedure above
Phylogenetic transformation matrix: \(\small\mathbf{T}=\left ( \mathbf{U}\mathbf{W}^{1/2} \mathbf{U}^{T}\right )^{-1}\)
U and W are eigenvectors and eigenvalues of V
Transformed data: \(\small\tilde{\mathbf{Y}}=\mathbf{TY}\)
Transformed design matrix: \(\small\tilde{\mathbf{X}}=\mathbf{TX}\)
Coefficients solved as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right )\)
see Adams. Evol. (2014); Adams and Collyer. Evol. (2018)
Leverage geometry to obtain robust summary statistics
One way: Sums-of-squares from object distances*
Avoids \(\begin{vmatrix} \mathbf{V} \end{vmatrix} = 0\), but still obtains: \(SS\), \(MS\), \(F\), \(R^2\), etc.
*Note: approach also used for Goodall’s F-test
See: Adams (2014a, 2014b, 2014c); Adams and Collyer (2015, 2018a, 2018b, 2019)
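The RRPP significance procedure can be sketched in base R: permute residuals of the reduced model, refit, and build an empirical F distribution (toy univariate data, assumed already phylogenetically transformed; names like `F.stat` are my own):

```r
set.seed(6)
n <- 20
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)                     # toy transformed data

ss <- function(e) sum(e^2)
F.stat <- function(y) {
  r.full <- resid(lm(y ~ x))
  r.red  <- resid(lm(y ~ 1))
  (ss(r.red) - ss(r.full)) / (ss(r.full) / (n - 2))
}

obs <- F.stat(y)
fit.red <- lm(y ~ 1)                        # reduced (null) model
F.rand <- replicate(999, {
  y.perm <- fitted(fit.red) + sample(resid(fit.red))  # RRPP: permute residuals
  F.stat(y.perm)
})
mean(c(obs, F.rand) >= obs)                 # empirical p-value; no |V| required
```

Because significance comes from the permutation distribution of F rather than from \(\log{L}\), the procedure never needs \(\begin{vmatrix} \mathbf{V} \end{vmatrix}\), and so does not degenerate as dimensions increase.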
Does head shape covary with body size among Plethodon salamander species?
## Df SS MS Rsq F Z Pr(>Cohen's f-squared)
## svl 1 0.00066 0.000659 0.07 3.03 2.02 0.012
## Residuals 40 0.00870 0.000217 0.93
## Total 41 0.00936
YES! There is a significant association between head shape and body size in salamanders
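An analysis of this form can be run with geomorph's `procD.pgls`; a sketch using the `plethspecies` dataset bundled with geomorph (an illustration only, not the exact analysis behind the table above; centroid size stands in for SVL here):

```r
library(geomorph)

data(plethspecies)                          # Plethodon landmarks + phylogeny
Y.gpa <- gpagen(plethspecies$land, print.progress = FALSE)   # GPA alignment
gdf <- geomorph.data.frame(Y.gpa)           # contains coords and Csize
fit <- procD.pgls(coords ~ Csize,           # shape ~ size, conditioned on phylogeny
                  phy = plethspecies$phy,
                  data = gdf, print.progress = FALSE)
summary(fit)                                # SS, MS, Rsq, F, Z, and RRPP p-value
```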
Phylogenetic transformation + RRPP provides analytical generalization of PCMs for high-dimensional data
All manner of linear models can be envisioned: phylogenetic regression, ANOVA, factorial models, etc.
Provides analytical tool for evaluating macroevolutionary trends of shape change in a phylogenetic context
Sets the stage for the development of other PCMMs (perhaps the subject of future discussions)